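As demonstrated by GPT-3 and T5, transformers grow in capability as parameter spaces become larger and larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows models to grow dramatically with a sub-linear increase in computational cost and GPU memory requirements. Recent models such as RAG and REALM have introduced retrieval into conditional generation. These models incorporate neural initial retrieval from a corpus of passages. We build on this line of research, proposing Re2G, which combines both neural initial retrieval and reranking into BART-based sequence-to-sequence generation. Our reranking approach also permits merging retrieval results from sources with incomparable scores, enabling an ensemble of BM25 and neural initial retrieval. To train our system end-to-end, we introduce a novel variation of knowledge distillation to train the initial retrieval, reranker, and generation using only ground truth on the target sequence output. We find large gains in four diverse tasks: zero-shot slot filling, question answering, fact checking, and dialog, with relative gains of 9% to 34% over the previous state of the art on the KILT leaderboard. We make our code available as open source at https://github.com/ibm/kgi-slot-filling/tree/re2g.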
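The reranking step that makes BM25 and neural retrieval results mergeable can be sketched as follows. This is a minimal illustration, not the Re2G implementation. The hit tuples, `merge_and_rerank`, and the token-overlap `overlap_score` are all assumptions; `overlap_score` is a hypothetical stand-in for a trained cross-encoder reranker. The key point is that each retriever's own scores sit on incomparable scales, so the sketch discards them and rescores every candidate with one shared model.

```python
def overlap_score(query, passage):
    # Hypothetical stand-in for a trained cross-encoder reranker:
    # fraction of query tokens that also appear in the passage.
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def merge_and_rerank(query, bm25_hits, dense_hits, rerank_score=overlap_score, k=3):
    """Merge hits from two retrievers and rank them with one shared reranker.

    Each hit is a (passage_id, passage_text, retriever_score) tuple. The
    retriever scores are ignored because BM25 and dense scores are not on
    comparable scales.
    """
    candidates = {}
    for pid, text, _score in bm25_hits + dense_hits:
        candidates[pid] = text  # de-duplicate passages found by both retrievers
    scored = [(rerank_score(query, text), pid) for pid, text in candidates.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by id
    return [pid for _score, pid in scored[:k]]


bm25_hits = [("p1", "Hamlet was written by Shakespeare", 12.7),
             ("p2", "Denmark is a country", 8.1)]
dense_hits = [("p3", "who wrote the play Hamlet ? William Shakespeare", 0.83),
              ("p1", "Hamlet was written by Shakespeare", 0.79)]
print(merge_and_rerank("who wrote Hamlet", bm25_hits, dense_hits, k=2))  # prints ['p3', 'p1']
```

Passage `p1` is found by both retrievers but is scored only once, and its very different BM25 (12.7) and dense (0.79) scores play no role in the final order.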
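Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed. However, we hypothesize that in a large KB, the reasoning patterns required to answer a query type recur for various entities in their respective subgraph neighborhoods. Leveraging this structural similarity between local neighborhoods of different subgraphs, we introduce a semiparametric model (CBR-SUBG) with (i) a nonparametric component that, for each query, dynamically retrieves other similar k-nearest-neighbor (KNN) training queries along with query-specific subgraphs, and (ii) a parametric component that is trained to identify the (latent) reasoning patterns from the subgraphs of KNN queries and then apply them to the subgraph of the target query. We also propose an adaptive subgraph collection strategy to select a query-specific compact subgraph, allowing us to scale to the full Freebase KB containing billions of facts. We show that CBR-SUBG can answer queries requiring subgraph reasoning patterns and performs competitively with the best models on several KBQA benchmarks. Our subgraph collection strategy also produces more compact subgraphs (e.g., a 55% reduction in size for WebQSP while increasing answer recall by 4.85%). Code, models, and subgraphs are available at https://github.com/rajarshd/cbr-subg.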
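The nonparametric half of the model can be pictured as nearest-neighbor search over query embeddings. In this sketch the embedding vectors, the `(query, vector, subgraph)` case format, the relation names, and `retrieve_knn_cases` are all hypothetical. CBR-SUBG's parametric component, which reasons over the retrieved subgraphs, is not shown. The paper's actual query encoder and similarity function may also differ from the cosine similarity used here.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_knn_cases(query_vec, train_cases, k=2):
    """Return the k training cases whose query embeddings are most similar.

    Each case is a (query_text, query_vec, subgraph) triple, where the subgraph
    is a set of (head, relation, tail) facts around the query entity.
    """
    ranked = sorted(train_cases, key=lambda case: cosine(query_vec, case[1]), reverse=True)
    return ranked[:k]


# Hypothetical training cases with made-up 3-d embeddings and relation names.
train_cases = [
    ("who directed Inception", [0.9, 0.1, 0.0],
     {("Inception", "film.directed_by", "Christopher Nolan")}),
    ("where was Obama born", [0.0, 1.0, 0.0],
     {("Barack Obama", "people.place_of_birth", "Honolulu")}),
    ("who directed Titanic", [1.0, 0.0, 0.2],
     {("Titanic", "film.directed_by", "James Cameron")}),
]
neighbors = retrieve_knn_cases([1.0, 0.0, 0.1], train_cases, k=2)
print([case[0] for case in neighbors])  # prints ['who directed Titanic', 'who directed Inception']
```

The retrieved neighbors share the `film.directed_by` pattern. Exposing such a recurring reasoning pattern to the parametric component is the purpose of the retrieval step.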
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community, but have not received as much attention as lower-level tasks like speech and speaker recognition. In particular, there are far fewer SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers. Recent work has begun to introduce such benchmark datasets for several tasks. In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape. We contribute four tasks: question answering and summarization, which involve inference over longer speech sequences; named entity localization, which addresses the speech-specific task of locating the targeted content in the signal; and dialog act classification, which identifies the function of a given speech utterance. We follow the blueprint of the Spoken Language Understanding Evaluation (SLUE) benchmark suite. To facilitate the development of SLU models that leverage the success of pre-trained speech representations, we will publish for each task (i) annotations for a relatively small fine-tuning set, (ii) annotated development and test sets, and (iii) baseline models for easy reproducibility and comparison. We present the details of data collection and annotation and the performance of the baseline models. We also analyze how sensitive the performance of pipeline models (speech recognizer + text model) is to speech recognition accuracy, using more than 20 state-of-the-art speech recognition models.
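A pipeline sensitivity analysis needs one accuracy number per speech recognizer. This sketch assumes that number is word error rate (WER), the usual metric. It computes WER as the word-level Levenshtein distance divided by the reference length. The function name is illustrative, and text normalization (casing, punctuation) is ignored.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                 # deletion
                             dist[i][j - 1] + 1,                 # insertion
                             dist[i - 1][j - 1] + substitution)  # match or substitution
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 2 / 6
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.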
Language models are widely deployed to provide automatic text completion services in user products. However, recent research has revealed that language models (especially large ones) bear considerable risk of memorizing private training data, which is then vulnerable to leakage and extraction by adversaries. In this study, we test the efficacy of a range of privacy-preserving techniques to mitigate unintended memorization of sensitive user text, while varying other factors such as model size and adversarial conditions. We test both "heuristic" mitigations (those without formal privacy guarantees) and Differentially Private training, which provides provable levels of privacy at the cost of some model performance. Our experiments show that, with the exception of L2 regularization, heuristic mitigations are largely ineffective in preventing memorization in our test suite, possibly because they make overly strong assumptions about the characteristics that define "sensitive" or "private" text. In contrast, Differential Privacy reliably prevents memorization in our experiments, despite its computational and model-performance costs.
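Differentially private training is commonly done with DP-SGD. Each example's gradient is clipped to a fixed norm, the clipped gradients are summed, Gaussian noise scaled to that norm is added, and the result is averaged. The sketch below computes one such noisy gradient in plain Python, under the assumption that DP-SGD is the training method. The source does not state which DP algorithm or hyperparameters were used. Privacy accounting, i.e. computing epsilon, is omitted.

```python
import math
import random


def clip_to_norm(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng=random):
    """One DP-SGD style gradient estimate: clip, sum, add Gaussian noise, average."""
    total = [0.0] * len(per_example_grads[0])
    for grad in per_example_grads:
        for i, v in enumerate(clip_to_norm(grad, max_norm)):
            total[i] += v
    std = noise_multiplier * max_norm  # noise scale is tied to the clipping bound
    noisy = [t + rng.gauss(0.0, std) for t in total]
    return [v / len(per_example_grads) for v in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]  # per-example gradient norms 5.0 and 0.5
print(dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Clipping bounds how much any single example, such as one containing a user's private text, can move the model. The added noise masks whatever influence remains, which is why the guarantee costs some model performance.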
Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate state along the path that the denoising would follow given the original conditioning. However, DDIM inversion for real images is unstable as it relies on local linearization assumptions, which result in the propagation of errors, leading to incorrect image reconstruction and loss of content. To alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors which are used to invert each other in an alternating fashion. Using Stable Diffusion, a state-of-the-art latent diffusion model, we demonstrate that EDICT successfully reconstructs real images with high fidelity. On complex image datasets like MS-COCO, EDICT reconstruction significantly outperforms DDIM, improving the mean square error of reconstruction by a factor of two. Using noise vectors inverted from real images, EDICT enables a wide range of image edits--from local and global semantic edits to image stylization--while maintaining fidelity to the original image structure. EDICT requires no model training/finetuning, prompt tuning, or extra data and can be combined with any pretrained DDM. Code is available at https://github.com/salesforce/EDICT.
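The coupling idea behind exact inversion can be checked on a toy problem. In this sketch, `toy_eps` is a hypothetical stand-in for the noise-prediction network. The step coefficients `a` and `b` are made up rather than derived from a real noise schedule, and `p` is the mixing weight between the two coupled sequences. Each sequence is updated using the noise predicted from the other, and then the two are mixed. Every sub-step can therefore be undone algebraically, whatever `toy_eps` computes.

```python
import math


def toy_eps(v, t):
    # Hypothetical stand-in for the learned noise predictor eps(latent, t).
    return [math.sin(1.7 * x + 0.3 * t) for x in v]


def denoise_step(x, y, t, a, b, p, eps=toy_eps):
    """One coupled denoising step on the paired latents (x, y)."""
    x_mid = [a * xv + b * e for xv, e in zip(x, eps(y, t))]      # x updated with noise predicted from y
    y_mid = [a * yv + b * e for yv, e in zip(y, eps(x_mid, t))]  # y updated with noise predicted from x_mid
    x_new = [p * u + (1 - p) * w for u, w in zip(x_mid, y_mid)]  # mix the two sequences
    y_new = [p * w + (1 - p) * u for w, u in zip(y_mid, x_new)]
    return x_new, y_new


def invert_step(x_new, y_new, t, a, b, p, eps=toy_eps):
    """Exact inverse of denoise_step: undo each sub-step in reverse order."""
    y_mid = [(w - (1 - p) * u) / p for w, u in zip(y_new, x_new)]
    x_mid = [(u - (1 - p) * w) / p for u, w in zip(x_new, y_mid)]
    y = [(w - b * e) / a for w, e in zip(y_mid, eps(x_mid, t))]
    x = [(u - b * e) / a for u, e in zip(x_mid, eps(y, t))]
    return x, y


# Made-up per-step coefficients; a real sampler derives them from the noise schedule.
steps = range(1, 11)
coeffs = {t: (1.0 + 0.01 * t, -0.1) for t in steps}
image = [0.2, -1.0, 0.5]

x, y = list(image), list(image)
for t in steps:                  # inversion: image -> noise
    x, y = invert_step(x, y, t, *coeffs[t], p=0.93)
for t in reversed(steps):        # generation: noise -> image
    x, y = denoise_step(x, y, t, *coeffs[t], p=0.93)
print(max(abs(u - v) for u, v in zip(x, image)))  # floating-point error only
```

The round trip never requires `toy_eps` to be linear. A single-sequence DDIM-style inversion, by contrast, must approximate the noise prediction at a point it has not yet computed, which is the local linearization the abstract identifies as the source of error.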
Mixup is a popular data augmentation technique that creates new samples by linear interpolation between two given data samples, improving both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning; it uses a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different. However, we found that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study on various important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of networks trained with mixup in the light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network to enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners who commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.
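The two ingredients studied here can be written down in a few lines. This is a minimal sketch assuming the standard formulations. Mixup draws lambda from Beta(alpha, alpha) and interpolates inputs and one-hot labels. The distillation loss is the temperature-softened KL divergence of Hinton et al., scaled by T^2. The function names and default values are illustrative and not taken from the MIX-KD code. In a mixup-plus-KD setup, the teacher would be queried on the mixed input to produce `teacher_logits`.

```python
import math
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Convex combination of two examples (inputs and one-hot labels)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) between softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return temperature ** 2 * kl


x, y, lam = mixup([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], rng=random.Random(0))
print(x, y, lam)
print(kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0: student matches teacher
```

Both pieces produce soft targets: mixup through interpolated labels, KD through a high temperature. This shared effect is one way to read the "smoothness" link the abstract describes.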
Many self-supervised speech models, varying in their pre-training objective, input modality, and pre-training data, have been proposed in the last few years. Despite impressive empirical successes on downstream tasks, we still have a limited understanding of the properties encoded by the models and the differences across models. In this work, we examine the intermediate representations for a variety of recent models. Specifically, we measure acoustic, phonetic, and word-level properties encoded in individual layers, using a lightweight analysis tool based on canonical correlation analysis (CCA). We find that these properties evolve across layers differently depending on the model, and the variations relate to the choice of pre-training objective. We further investigate the utility of our analyses for downstream tasks by comparing the property trends with performance on speech recognition and spoken language understanding tasks. We discover that CCA trends provide reliable guidance to choose layers of interest for downstream tasks and that single-layer performance often matches or improves upon using all layers, suggesting implications for more efficient use of pre-trained models.
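CCA-based layer analysis compares a layer's representations with a property, such as acoustic features or word embeddings, computed over the same frames. Below is a minimal NumPy sketch of plain CCA. It returns the canonical correlations as the singular values of the whitened cross-covariance. The paper's tool may use a CCA variant (for example, projection-weighted CCA), and the ridge term `reg` is an assumption added for numerical stability. The random matrices are synthetic stand-ins for real activations.

```python
import numpy as np


def cca_correlations(X, Y, reg=1e-6):
    """Canonical correlations between X (n x p) and Y (n x q), largest first."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    cxy = X.T @ Y / (n - 1)

    def inv_sqrt(c):
        # Symmetric inverse square root via eigendecomposition.
        vals, vecs = np.linalg.eigh(c)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    # Singular values of the whitened cross-covariance are the canonical correlations.
    return np.linalg.svd(inv_sqrt(cxx) @ cxy @ inv_sqrt(cyy), compute_uv=False)


rng = np.random.default_rng(0)
layer = rng.normal(size=(500, 8))          # synthetic layer activations (frames x dims)
linked = layer @ rng.normal(size=(8, 4))   # a property linearly encoded in the layer
unrelated = rng.normal(size=(500, 4))      # a property the layer does not encode
print(cca_correlations(layer, linked).mean(), cca_correlations(layer, unrelated).mean())
```

Averaging the canonical correlations gives one score per layer and property. Repeating this across layers yields the kind of layer-wise trend used to pick layers for downstream tasks.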
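The Resource Description Framework (RDF) and the Property Graph (PG) are the two most commonly used data models for representing, storing, and querying graph data. We present the Expressive Reasoning Graph Store (ERGS), a graph store built on top of JanusGraph (a property graph store) that also allows storing and querying RDF datasets. First, we describe how we convert RDF data into a property graph representation. We then describe the query translation module that converts SPARQL queries into a series of Gremlin traversals. The resulting converter and translator thus allow any Apache TinkerPop-compliant graph database to store and query RDF datasets. We demonstrate the effectiveness of the proposed approach using JanusGraph as the base property graph store and compare its performance with standard RDF systems.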
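One common way to map RDF onto a property graph turns literal-valued triples into vertex properties and IRI-valued triples into labeled edges. The sketch below assumes that mapping and a toy input format in which literals are wrapped in double quotes and prefixed names are plain strings. ERGS's actual conversion scheme may differ, for example in how it handles datatypes, blank nodes, or named graphs.

```python
def rdf_to_property_graph(triples):
    """Map RDF triples onto vertices with properties plus labeled edges.

    Toy input format (an assumption of this sketch, not a real RDF parser):
    literals are strings wrapped in double quotes; anything else is an IRI.
    """
    vertices = {}  # IRI -> {predicate: [literal values]}
    edges = []     # (subject IRI, predicate as edge label, object IRI)
    for subj, pred, obj in triples:
        vertices.setdefault(subj, {})
        if obj.startswith('"') and obj.endswith('"'):
            # Literal object: store as a (multi-valued) vertex property.
            vertices[subj].setdefault(pred, []).append(obj[1:-1])
        else:
            # IRI object: make sure the target vertex exists, then link it.
            vertices.setdefault(obj, {})
            edges.append((subj, pred, obj))
    return vertices, edges


triples = [
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", '"Bob"'),
]
vertices, edges = rdf_to_property_graph(triples)
print(vertices)  # {'ex:alice': {'ex:name': ['Alice']}, 'ex:bob': {'ex:name': ['Bob']}}
print(edges)     # [('ex:alice', 'ex:knows', 'ex:bob')]
```

Under this mapping, a SPARQL triple pattern such as `?x ex:knows ?y` roughly corresponds to the Gremlin traversal `g.V().as('x').out('ex:knows').as('y').select('x', 'y')`.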
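Temporal relation extraction in text is a crucial yet challenging problem in natural language understanding. Depending on the distance between events, a model must learn to weigh information from the local and global contexts surrounding an event pair differently when predicting their temporal relation. Learning how to fuse this information has proved challenging for transformer-based language models. We therefore present MulCo: Multi-Scale Contrastive Co-Training, a technique for better fusing local and global contextualized features. Our model uses a BERT-based language model to encode local context and a graph neural network (GNN) to represent global document-level syntactic and temporal features. Previous state-of-the-art methods either simply concatenate multi-view features or select optimal sentences with sophisticated reinforcement learning approaches. Unlike them, our model co-trains the GNN and BERT modules using a multi-scale contrastive learning objective. The GNN and BERT modules learn a synergistic parameterization by contrasting GNN multi-layer multi-hop subgraphs (i.e., global context embeddings) with BERT outputs (i.e., local context embeddings). We empirically demonstrate that MulCo fuses local and global contexts encoded by BERT and a GNN better than the current state of the art. Our experimental results show that MulCo achieves new state-of-the-art results on several temporal relation extraction datasets.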
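A contrastive co-training objective of this kind can be sketched as an InfoNCE loss between the two encoders' embeddings of the same event pairs, averaged over GNN hop scales. This is an illustrative sketch, not MulCo's exact objective. The embedding lists, the temperature, and the treatment of each GNN layer as one "scale" are assumptions. Details such as projection heads or a symmetric loss are omitted.

```python
import math


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should score highest against positives[i]."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarities to every positive in the batch; the others act as negatives.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_sum_exp - logits[i]  # cross-entropy with target index i
    return loss / len(a)


def multi_scale_contrastive_loss(gnn_layer_embs, bert_embs, temperature=0.1):
    """Average InfoNCE between each GNN layer's embeddings and the BERT embeddings."""
    losses = [info_nce(layer, bert_embs, temperature) for layer in gnn_layer_embs]
    return sum(losses) / len(losses)


# Two event pairs with 2-d embeddings (made-up numbers).
bert = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[[1.0, 0.1], [0.1, 1.0]],   # 1-hop GNN embeddings
           [[0.9, 0.0], [0.0, 0.9]]]   # 2-hop GNN embeddings
swapped = [[[0.0, 1.0], [1.0, 0.0]]]   # embeddings matched to the wrong pairs
print(multi_scale_contrastive_loss(aligned, bert), multi_scale_contrastive_loss(swapped, bert))
```

Minimizing this loss pulls each hop scale's view of an event pair toward BERT's view of the same pair and away from the other pairs in the batch. This is how the two modules are pushed toward a shared parameterization.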